Automatic and Human Evaluation Study of a Rule-based and a Statistical Catalan-Spanish Machine Translation Systems

نویسندگان

  • Marta R. Costa-Jussà
  • Mireia Farrús
  • José B. Mariño
  • José A. R. Fonollosa
چکیده

Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have largely been used during the last years, one of the aims in the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (particularly, statistical) Catalan-Spanish machine translation systems, both of them freely available in the web. The translation quality analysis is performed under two different domains: journalistic and medical. The systems are evaluated by using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanishto-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be more penalized by human judgements than the errors of the rule-based system. This can be explained because statistical errors are usually unexpected and they do not follow any pattern.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

متن کامل

Study and Comparison of Rule-Based and Statistical Catalan-Spanish Machine Translation Systems

Machine translation systems can be classified into rule-based and corpusbased approaches, in terms of their core methodology. Since both paradigms have been largely used during the last years, one of the aims in the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of several specific Catalan-Spanish ma...

متن کامل

Improving a Catalan-Spanish Statistical Translation System using Morphosyntactic Knowledge

In this paper, a human evaluation of a Catalan-Spanish Ngram-based statistical machine translation system is used to develop specific techniques based on the use of grammatical categories, lexical categorisation and text processing, for the enhancement of the final translation. The system is successfully improved when testing with ad hoc and general corpora, as it is shown in the final automati...

متن کامل

OpenMT: Open Source Machine Translation Using Hybrid Methods

The main goal of the OpenMT project is the development of open source machine translation architectures based on hybrid models and advanced syntactic–semantic processors. These architectures combine the three main Machine Translation (MT) frameworks, Rule-based (RBMT), Statistical (SMT) and Example–based (EBMT), into hybrid systems. Defined architectures and results will be open source, allow f...

متن کامل

The TCH machine translation system for IWSLT 2008

This paper reports on the first participation of TCH (Toshiba (China) Research and Development Center) at the IWSLT evaluation campaign. We participated in all the 5 translation tasks with Chinese as source language or target language. For Chinese-English and English-Chinese translation, we used hybrid systems that combine rule-based machine translation (RBMT) method and statistical machine tra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010